Compressed histograms with arbitrary bucket layouts for selectivity estimation
نویسندگان
چکیده
Recent multidimensional histogram techniques such as GenHist and STHoles use an arbitrary bucket layout. This layout has the advantage of requiring a smaller number of buckets to model tuple densities than those required by the traditional grid or recursive layouts. However, the arbitrary bucket layout brings an inherent disadvantage of requiring more memory to store each bucket location information. This diminishes the advantage of requiring fewer buckets and, therefore, has an adverse effect on the resulting selectivity estimation accuracy. To our knowledge, however, no existing histogram-based technique with arbitrary layout addresses this issue. In this paper, we introduce the idea of bucket location compression and then demonstrate its effectiveness for improving selectivity estimation accuracy by proposing the STHoles+ technique. STHoles+ extends STHoles by quantizing each coordinate of a bucket relative to the coordinate of the smallest enclosing bucket. This quantization increases the number of histogram buckets that can be stored in the histogram. Our quantization scheme allows STHoles+ to trade precision of histogram bucket locations for storing more buckets. Experimental results show that STHoles+ outperforms STHoles on various data distributions, query distributions, and other factors such as available memory size, quantization resolution, and dimensionality of the data space.
منابع مشابه
Piecewise Linear Histograms for Selectivity Estimation
Selectivity estimation of queries is of critical importance to query optimization. In order to get accurate estimations, database management systems must maintain statistics to capture the underlying data distribution. Histograms are extensively used in commercial database systems for this purpose. Most current histogram techniques make the assumption that all values in a single bucket appear w...
متن کاملOptimal Histograms with Quality Guarantees
Histograms are commonly used to capture attribute value distribution statistics for query optimizers. More recently, histograms have also been considered as a way to produce quick approximate answers to decision support queries. This widespread interest in histograms motivates the problem of computing his-tograms that are good under a given error metric. In particular, we are interested in an e...
متن کاملA Histogram Utilizing the Cluster Information
Histograms are summary structures of large datasets, which are mainly used for selectivity estimation during query optimization. Selectivity estimation is the fast approximation of query result size. In this paper, we focus on multi-dimensional histograms, especially bidimensional histograms. At the time of selectivity estimation, buckets partially overlapping with a query return approximated r...
متن کاملOptimal Histograms for Hierarchical Range Queries Extended Abstract
Now there is tremendous interest in data warehousing and OLAP applications. OLAP applications typically view data as having multiple logical dimensions (e.g., product, location) with natural hierarchies de ned on each dimension, and analyze the behavior of various measure attributes (e.g., sales, volume) in terms of the dimensions. OLAP queries typically involve hierarchical selections on some ...
متن کاملQuery-Condition-Aware Histograms in Selectivity Estimation Method
The paper shows an adaptive approach to the query selectivity estimation problem for queries with a range selection condition based on continuous attributes. The selectivity factor estimates a size of data satisfying a query condition. This estimation is calculated at the initial stage of the query processing for choosing the optimal query execution plan. A non-parametric estimator of probabili...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Inf. Sci.
دوره 177 شماره
صفحات -
تاریخ انتشار 2007